Initialising ...
Initialising ...
Initialising ...
Initialising ...
Initialising ...
Initialising ...
Initialising ...
Ali, Y.*; Ina, Takuya*; Onodera, Naoyuki; Idomura, Yasuhiro
no journal, ,
Krylov subspace solvers for the pressure Poisson equation occupy of the total computing cost in extreme scale multi-phase CFD simulation. To accelerate the Poisson solver, we port a Chebyshev Basis communication-avoiding Conjugate Gradient (CBCG) solver with block Jacobi (BJ) preconditioning on P100 GPUs. The CBCG solver consists of BJ preconditioning, Sparse Matrix Vector product (SpMV), and Tall-Skinny matrix operations. We re-design the BJ-preconditioner for thread-block parallelization and efficient coalescing data load, and apply batched gemm to the Tall-Skinny matrix operations. By these optimization, all main kernels achieved of the theoretical performance based on roofline estimation, and an order of magnitude speedup of the single node performance was obtained against CPU nodes.